regexp(3Tcl)                                                  regexp(3Tcl)

NAME
regexp - Match a regular expression against a string

SYNOPSIS
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?

DESCRIPTION
Determines whether the regular expression exp matches part or all of
string and returns 1 if it does, 0 if it doesn't.

If additional arguments are specified after string then they are treated
as the names of variables in which to return information about which
part(s) of string matched exp. MatchVar will be set to the range of
string that matched all of exp. The first subMatchVar will contain the
characters in string that matched the leftmost parenthesized
subexpression within exp, the next subMatchVar will contain the
characters that matched the next parenthesized subexpression to the right
in exp, and so on.
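The following is a minimal sketch of the return value and of matchVar/subMatchVar. It assumes a tclsh interpreter, and the variable names (found, missing, all, name, value) are only illustrative. The patterns are enclosed in braces so that Tcl does not apply command substitution to [ ] or backslash substitution before regexp sees them. Later Tcl releases (8.1 and up) use a newer regular expression engine, so these simple cases are assumed, not guaranteed, to behave identically there.

```tcl
# regexp returns 1 when exp matches part of string, 0 otherwise.
set found   [regexp {b+} "aabbbcc"]
set missing [regexp {z} "aabbbcc"]
puts "$found $missing"                ;# prints "1 0"

# matchVar receives the whole match; each subMatchVar receives one
# parenthesized subexpression, counted from the left.
regexp {([a-z]+)=([0-9]+)} "width=42" all name value
puts "all=$all name=$name value=$value"
# prints "all=width=42 name=width value=42"
```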

If the initial arguments to regexp start with - then they are treated as
switches. The following switches are currently supported:

-nocase   Causes upper-case characters in string to be treated as lower
          case during the matching process.

-indices  Changes what is stored in matchVar and the subMatchVars. Instead
          of storing the matching characters from string, each variable
          will contain a list of two decimal strings giving the indices in
          string of the first and last characters in the matching range of
          characters.

--        Marks the end of switches. The argument following this one will
          be treated as exp even if it starts with a -.
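Each of the three switches can be exercised as shown below. This is a sketch with illustrative variable names. The pattern is written in lower case because -nocase, as described above, folds upper-case characters in string.

```tcl
# -nocase: "HELLO" in the string is treated as "hello".
set withNocase    [regexp -nocase {hello} "Say HELLO"]
set withoutNocase [regexp {hello} "Say HELLO"]
puts "$withNocase $withoutNocase"      ;# prints "1 0"

# -indices: variables hold "first last" character indices, not text.
# In "hello": h=0 e=1 l=2 l=3 o=4.
regexp -indices {(l+)o} "hello" whole sub
puts "whole={$whole} sub={$sub}"       ;# prints "whole={2 4} sub={2 3}"

# --: ends switch parsing, so an exp beginning with "-" is accepted
# instead of being rejected as an unknown switch.
set dashPattern [regexp -- {-v} "cmd -v"]
puts $dashPattern                      ;# prints "1"
```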

If there are more subMatchVars than parenthesized subexpressions within
exp, or if a particular subexpression in exp doesn't match the string
(e.g. because it was in a portion of the expression that wasn't matched),
then the corresponding subMatchVar will be set to ``-1 -1'' if -indices
has been specified or to an empty string otherwise.
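Both cases (a subexpression that takes no part in the match, and a surplus subMatchVar) can be seen in this small sketch. The variable names are hypothetical.

```tcl
# Only the first branch of (a)|(b) is used to match "a",
# so the second subexpression does not match.
regexp {(a)|(b)} "a" m first second
puts "first=<$first> second=<$second>"    ;# prints "first=<a> second=<>"

# With -indices, the non-matching subexpression is reported as "-1 -1".
regexp -indices {(a)|(b)} "a" m firstIdx secondIdx
puts "$firstIdx / $secondIdx"             ;# prints "0 0 / -1 -1"

# More subMatchVars than subexpressions: the extra one is set the same way.
regexp {(x)} "x" m group extra
puts "extra=<$extra>"                     ;# prints "extra=<>"
```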

REGULAR EXPRESSIONS
Regular expressions are implemented using Henry Spencer's package
(thanks, Henry!), and much of the description of regular expressions
below is copied verbatim from his manual entry.

A regular expression is zero or more branches, separated by ``|''. It
matches anything that matches one of the branches.
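The following minimal sketch illustrates branches separated by ``|''. The variable names are illustrative.

```tcl
# The expression matches if any of its branches matches.
set r1 [regexp {cat|dog} "hotdog"]
set r2 [regexp {cat|dog} "cow"]
puts "$r1 $r2"                         ;# prints "1 0"

# matchVar shows which text was matched; here the earliest-starting
# match wins (see CHOOSING AMONG ALTERNATIVE MATCHES).
regexp {cat|dog} "my dog and cat" m
puts $m                                ;# prints "dog"
```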

A branch is zero or more pieces, concatenated. It matches a match for
the first, followed by a match for the second, etc.

A piece is an atom possibly followed by ``*'', ``+'', or ``?''. An atom
followed by ``*'' matches a sequence of 0 or more matches of the atom.
An atom followed by ``+'' matches a sequence of 1 or more matches of the
atom. An atom followed by ``?'' matches a match of the atom, or the null
string.
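The three piece suffixes can be compared side by side as follows. This is a sketch with illustrative variable names.

```tcl
regexp {ab*c} "ac" m1             ;# * allows zero b's
regexp {ab+c} "abbbc" m2          ;# + requires one or more b's
set noMatch [regexp {ab+c} "ac"]  ;# so "ac" does not match ab+c
regexp {colou?r} "color" m3       ;# ? makes the u optional
puts "$m1 $m2 $noMatch $m3"       ;# prints "ac abbbc 0 color"
```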

An atom is a regular expression in parentheses (matching a match for the
regular expression), a range (see below), ``.'' (matching any single
character), ``^'' (matching the null string at the beginning of the input
string), ``$'' (matching the null string at the end of the input string),
a ``\'' followed by a single character (matching that character), or a
single character with no other significance (matching that character).
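The kinds of atom listed above can be tried one at a time, as in this sketch with illustrative variable names. The braces pass the backslash in a\.c through to regexp unchanged.

```tcl
set r1 [regexp {^a.c$} "abc"]   ;# . matches the b; ^ and $ anchor both ends
set r2 [regexp {^a.c$} "xabc"]  ;# fails: ^ only matches at the start
set r3 [regexp {a\.c} "a.c"]    ;# \. matches a literal dot
set r4 [regexp {a\.c} "abc"]    ;# fails: b is not a dot
regexp {(ab)+} "xababy" m       ;# parentheses let + repeat "ab" as one atom
puts "$r1 $r2 $r3 $r4 $m"       ;# prints "1 0 1 0 abab"
```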

A range is a sequence of characters enclosed in ``[]''. It normally
matches any single character from the sequence. If the sequence begins
with ``^'', it matches any single character not from the rest of the
sequence. If two characters in the sequence are separated by ``-'', this
is shorthand for the full list of ASCII characters between them (e.g.
``[0-9]'' matches any decimal digit). To include a literal ``]'' in the
sequence, make it the first character (following a possible ``^''). To
include a literal ``-'', make it the first or last character.
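The range rules above, including the placement tricks for ``]'' and ``-'', are shown in the following sketch. The variable names are hypothetical. The patterns must be braced here, since an unbraced [0-9] would be taken by Tcl as command substitution.

```tcl
regexp {[0-9]+} "abc123def" digits        ;# 0-9 expands to all digits
regexp {[^a-z]+} "abc123def" notLower     ;# leading ^ negates the set
set closeBracket [regexp {^[]x]$} {]}]    ;# ] first in the set is literal
set hyphen       [regexp {^[a-]$} {-}]    ;# - last in the set is literal
puts "$digits $notLower $closeBracket $hyphen"   ;# prints "123 123 1 1"
```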

CHOOSING AMONG ALTERNATIVE MATCHES
In general there may be more than one way to match a regular expression
to an input string. For example, consider the command

     regexp (a*)b* aabaaabb x y

Considering only the rules given so far, x and y could end up with the
values aabb and aa, aaab and aaa, ab and a, or any of several other
combinations. To resolve this potential ambiguity regexp chooses among
alternatives using the rule ``first then longest''. In other words, it
considers the possible matches in order working from left to right across
the input string and the pattern, and it attempts to match longer pieces
of the input string before shorter ones. More specifically, the
following rules apply in decreasing order of priority:
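Running the example command shows which of these candidates regexp actually picks. This is a sketch that assumes a tclsh interpreter. The braces around the pattern are optional here, and the -indices variables xIdx and yIdx are illustrative additions.

```tcl
# Run the example command and print the chosen values.
regexp {(a*)b*} aabaaabb x y
puts "x=$x y=$y"                 ;# prints "x=aab y=aa"

# -indices confirms that the chosen match is the leading part of the string.
regexp -indices {(a*)b*} aabaaabb xIdx yIdx
puts "x={$xIdx} y={$yIdx}"       ;# prints "x={0 2} y={0 1}"
```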

[1] If a regular expression could match two different parts of an input
    string then it will match the one that begins earliest.

[2] If a regular expression contains | operators then the leftmost
    matching sub-expression is chosen.

[3] In *, +, and ? constructs, longer matches are chosen in preference
    to shorter ones.

[4] In sequences of expression components the components are considered
    from left to right.
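Rules 1, 3 and 4 can each be observed with a one-line command, as in the sketch below (variable names are illustrative). Rule 2 is not shown separately because it is illustrated by the (ab|a)(b*)c example that follows. The newer engine in Tcl 8.1 and later resolves alternation differently in some cases, so these cases were chosen on the assumption that both engines agree on them.

```tcl
# Rule 1: the earliest-starting match wins over a longer later one.
regexp {b+} "abxbbbb" m1
# Rule 3: + takes as many characters as it can.
regexp {a+} "baaa" m2
# Rule 4: components are considered left to right, so the first
# (a*) consumes every a and the second gets the empty string.
regexp {(a*)(a*)} "aaa" m3 first second
puts "m1=$m1 m2=$m2 first=<$first> second=<$second>"
# prints "m1=b m2=aaa first=<aaa> second=<>"
```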

In the example from above, (a*)b* matches aab: the (a*) portion of the
pattern is matched first and it consumes the leading aa; then the b*
portion of the pattern consumes the next b. Or, consider the following
example:

     regexp (ab|a)(b*)c abc x y z

After this command x will be abc, y will be ab, and z will be an empty
string. Rule 4 specifies that (ab|a) gets first shot at the input string
and Rule 2 specifies that the ab sub-expression is checked before the a
sub-expression. Thus the b has already been claimed before the (b*)
component is checked and (b*) must match an empty string.
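The second example can be checked the same way. This is a minimal sketch that assumes a tclsh interpreter. The angle brackets in the output only make the empty value of z visible.

```tcl
# (ab|a) takes "ab" first, leaving nothing for (b*) before the c.
regexp {(ab|a)(b*)c} abc x y z
puts "x=$x y=$y z=<$z>"      ;# prints "x=abc y=ab z=<>"
```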

KEYWORDS
match, regular expression, string